ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Kim, Jiho, Choi, Junseong, Chay, Woosog, Kyung, Daeun, Kwon, Yeonsu, Jo, Yohan, Choi, Edward
As large language models (LLMs) become increasingly integrated into daily life, there is growing demand for AI assistants that are not only reactive but also proactive and personalized. While recent advances have pushed forward proactivity and personalization individually, their combination remains underexplored. To bridge this gap, we introduce ProPerSim, a new task and simulation framework for developing assistants capable of making timely, personalized recommendations in realistic home scenarios. In our simulation environment, a user agent with a rich persona interacts with the assistant, providing ratings on how well each suggestion aligns with its preferences and context. The assistant's goal is to use these ratings to learn and adapt to achieve higher scores over time. Built on ProPerSim, we propose ProPerAssistant, a retrieval-augmented, preference-aligned assistant that continually learns and adapts through user feedback. Experiments across 32 diverse personas show that ProPerAssistant adapts its strategy and steadily improves user satisfaction, highlighting the promise of uniting proactivity and personalization.
Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues
Chu, KuanChao, Chen, Yi-Pei, Nakayama, Hideki
This paper investigates the quality of multi-agent dialogues in simulations powered by Large Language Models (LLMs), focusing on a case study from Park et al. (2023), where 25 agents engage in day-long simulations of life, showcasing complex behaviors and interactions. Analyzing dialogues and memory over multiple sessions revealed significant issues such as repetition, inconsistency, and hallucination, exacerbated by the propagation of erroneous information. To combat these challenges, we propose a novel Screening, Diagnosis, and Regeneration (SDR) framework that detects and corrects utterance errors through a comprehensive process involving immediate issue identification, evidence gathering from past dialogues, and LLM analysis for utterance revision. The effectiveness of the SDR framework is validated through GPT-4 assessments and human evaluations, demonstrating marked improvements in dialogue consistency, diversity, and the reduction of false information. This work presents a pioneering approach to enhancing dialogue quality in multi-agent simulations, establishing a new standard for future research in the field.
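The screening, diagnosis, and regeneration stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the string-similarity repetition check, the 0.8 threshold, and the `revise` callable (a stand-in for the LLM analysis and rewriting step) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real detector."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(utterance: str, history: List[str], threshold: float = 0.8) -> bool:
    """Screening: flag an utterance that nearly repeats an earlier one.

    The threshold is an assumed value, not one taken from the paper.
    """
    return any(similarity(utterance, past) >= threshold for past in history)


def gather_evidence(utterance: str, history: List[str], top_k: int = 3) -> List[str]:
    """Diagnosis input: the past utterances most similar to the candidate."""
    ranked = sorted(history, key=lambda past: similarity(utterance, past), reverse=True)
    return ranked[:top_k]


def sdr_step(
    utterance: str,
    history: List[str],
    revise: Callable[[str, List[str]], str],
) -> str:
    """Return the utterance unchanged if it passes screening, else a revision.

    `revise` is hypothetical: in a real system it would prompt an LLM with the
    flagged utterance and the gathered evidence and return a rewritten line.
    """
    if not screen(utterance, history):
        return utterance
    evidence = gather_evidence(utterance, history)
    return revise(utterance, evidence)
```

In use, `revise` would wrap a model call; for a dry run it can be any function that takes the flagged utterance and a list of evidence strings and returns a replacement string.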
A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization
Chu, KuanChao, Chen, Yi-Pei, Nakayama, Hideki
This research investigates prompt designs for evaluating generated texts with large language models (LLMs). While LLMs are increasingly used for scoring various inputs, creating effective prompts for open-ended text evaluation remains challenging due to model sensitivity and the subjectivity inherent in evaluating text generation. Our study experimented with different prompt structures, altering the sequence of output instructions and including explanatory reasons. We found that the order in which reasons and scores are presented significantly influences LLMs' scoring, and that the effect varies with the level of rule understanding in the prompt. An additional optimization may enhance scoring alignment if sufficient data is available. This insight is crucial for improving the accuracy and consistency of LLM-based evaluations.
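To make the output-ordering variable concrete, here is a small sketch of how two prompt variants might be generated, one asking for the reason before the score and one asking for the score first. The function name, the JSON output format, the 1-10 scale, and the prompt wording are assumptions for illustration and are not taken from the paper.

```python
def build_eval_prompt(text: str, criterion: str, order: str = "reason_first") -> str:
    """Build an evaluation prompt whose output schema lists fields in a given order.

    order: "reason_first" asks the model to explain before scoring;
           "score_first" asks for the score before the explanation.
    """
    fields = {
        "reason": '"reason": "<one or two sentences>"',
        "score": '"score": <integer from 1 to 10>',  # assumed scale
    }
    if order == "reason_first":
        keys = ["reason", "score"]
    elif order == "score_first":
        keys = ["score", "reason"]
    else:
        raise ValueError(f"unknown order: {order!r}")

    # The schema is the only part of the prompt that differs between variants.
    schema = "{" + ", ".join(fields[k] for k in keys) + "}"
    return (
        f"Evaluate the following text for {criterion}.\n\n"
        f"Text:\n{text}\n\n"
        f"Respond with JSON in exactly this field order:\n{schema}"
    )


if __name__ == "__main__":
    print(build_eval_prompt("Once upon a time...", "coherence", order="score_first"))
```

Keeping everything except the schema identical means any difference in the scores returned for the two variants can be attributed to the field order alone.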
Humanoid Agents: Platform for Simulating Human-like Generative Agents
Wang, Zhilin, Chiu, Yu Ying, Chiu, Yu Cheung
Just as computational simulations of atoms, molecules and cells have shaped the way we study the sciences, true-to-life simulations of human-like agents can be valuable tools for studying human behavior. We propose Humanoid Agents, a system that guides Generative Agents to behave more like humans by introducing three elements of System 1 processing: Basic needs (e.g. hunger, health and energy), Emotion and Closeness in Relationships. Humanoid Agents are able to use these dynamic elements to adapt their daily activities and conversations with other agents, as supported by empirical experiments. Our system is designed to be extensible to various settings, three of which we demonstrate, as well as to other elements influencing human behavior (e.g. empathy, moral values and cultural background). Our platform also includes a Unity WebGL game interface for visualization and an interactive analytics dashboard to show agent statuses over time. Our platform is available at https://www.humanoidagents.com/ and the code is at https://github.com/HumanoidAgents/HumanoidAgents
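As a rough illustration of how basic needs could steer an agent's activity, the sketch below tracks three need levels that decay over time and overrides the planned activity when one drops too low. It is not the Humanoid Agents code: the class, the 0-10 scale, the threshold of 4, the use of "fullness" as the inverse of hunger, and the remedy activities are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Needs:
    """Need levels on an assumed 0-10 scale, where 10 means fully satisfied."""
    fullness: int = 10  # inverse of hunger (assumed naming)
    health: int = 10
    energy: int = 10

    def decay(self, amount: int = 1) -> None:
        """Lower every need by `amount`, never going below zero."""
        for name, level in vars(self).items():
            setattr(self, name, max(0, level - amount))

    def most_urgent(self, threshold: int = 4) -> Optional[str]:
        """Return the lowest need if it is below the threshold, else None."""
        name, level = min(vars(self).items(), key=lambda item: item[1])
        return name if level < threshold else None


# Hypothetical mapping from an unmet need to an activity that restores it.
REMEDIES = {
    "fullness": "eat a meal",
    "health": "rest and take medicine",
    "energy": "take a nap",
}


def choose_activity(needs: Needs, planned: str) -> str:
    """Keep the planned activity unless an urgent need overrides it."""
    urgent = needs.most_urgent()
    return REMEDIES[urgent] if urgent else planned


if __name__ == "__main__":
    agent_needs = Needs(fullness=5)
    for hour in range(3):
        agent_needs.decay()
        print(hour, choose_activity(agent_needs, planned="write code"))
```

In a fuller system the chosen activity would be passed to the language model as context for planning and dialogue rather than selected from a fixed table.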
Surprising things happen when you put 25 AI agents together in an RPG town
A group of researchers at Stanford University and Google have created a miniature RPG-style virtual world similar to The Sims, where 25 characters, controlled by ChatGPT and custom code, live out their lives independently with a high degree of realistic behavior. They wrote about their experiment in a preprint academic paper released on Friday. "Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day," write the researchers in their paper, "Generative Agents: Interactive Simulacra of Human Behavior." To pull this off, the researchers relied heavily on a large language model (LLM) for social interaction, specifically the ChatGPT API. In addition, they created an architecture that simulates minds with memories and experiences, then let the agents loose in the world to interact.
Artificial Intelligence Takes Over Smallville: Bots Throw Party at Local Bar - MetaTech
A team of researchers from Google and Stanford University recently conducted an intriguing experiment, wherein they created a virtual town for 25 AI "agents" to inhabit. The study, titled "Generative Agents: Interactive Simulacra of Human Behavior," aimed to explore the extent to which AI could mimic human behavior in a simulated environment, inspired by life-simulation games like The Sims. The researchers developed a town named "Smallville," populated it with ChatGPT-powered generative agents, and observed how they went about their day-to-day activities. Smallville consists of houses, a park, a bar, a shopping center, a pharmacy and a college. The agents exhibited a remarkable degree of human-like behavior, with the ability to make inferences, store information in memory, and then behave accordingly.
Researchers populated a tiny virtual town with AI (and it was very wholesome)
What would happen if you filled a virtual town with AIs and set them loose? As it turns out, they brush their teeth and are very nice to one another! But this unexciting outcome is good news for the researchers who did it, since they wanted to produce "believable simulacra of human behavior" and got just that. The paper describing the experiment, by Stanford and Google researchers, has not been peer reviewed or accepted for publication anywhere, but it makes for interesting reading nonetheless. The idea was to see if they could apply the latest advances in machine learning models to produce "generative agents" that take in their circumstances and output a realistic action in response. And that's very much what they got.